perm filename VIS[0,BGB]3 blob sn#069838 filedate 1973-11-05 generic text, type C, neo UTF8
2.0	Computer Vision Theory.


2.1	Introduction to Computer Vision Theory.

	In this chapter,   two theories are interleaved.   There is a
grand  theory, which  is my  interpretation of  the overall  state of
computer vision; and  there is a  petit theory,   which has  inspired
this work.  The  word "theory", as used here,  means  simply a set of
statements  presenting  a systematic  view of  a  subject. I  wish to
exclude the connotations that the theory is a  mathematical theory or
a  natural  theory.    Perhaps  there  can  be such  a  thing  as  an
"artificial theory" that lies between  the philosophy and the  design
of computer vision.  The rest of  this introduction is a  synopsis of
the two theories, which consist of six parts in three pairings: task
& system; bottom & top; recognition & description.

	(Vision task).   For me, the overall computer  vision problem
is  to write  a general  purpose program  that can  see and  act with
respect  to  the  real  physical  world.    The  interest   of  other
researchers  in  modeling  human  perception,   in  participating  in
traditional  philosophical arguments,  in  solving puzzle problems or
in  developing advanced  automation  techniques  must  constantly  be
taken into account when discussing computer vision.

	(cart   task). Given a computer controlled cart,
explore and map the world.

	(Cart Hardware Description). The cart at the Stanford
Artificial Intelligence Laboratory is intended for outdoor use and
consists of four bicycle wheels, a piece of plywood, two car
batteries, a television camera, a television transmitter, and a
toy airplane radio receiver. (The vehicle being discussed is not
"Shakey", which belongs to the Stanford Research Institute's
Artificial  Intelligence Group.   There  are two  "Stanford-ish" A.I.
Labs and each has a computer controlled vehicle.) Logically the  cart
has three motors  which can be commanded to  run in one or  the other
direction  under  computer control.    The six  possible  cart action
commands are: run forwards, run backwards, steer to the left,   steer
to the right, pan camera to the left,  pan camera to the right. 
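
	The command set above can be written down as a small table.
What follows is a minimal sketch, not the laboratory's actual control
code: the names Motor and CartCommand, and the use of +1 and -1 for
the two running directions, are assumptions made for illustration.
The text says only that three motors can each be run in one or the
other direction.

```python
from enum import Enum


class Motor(Enum):
    """The three logical motors of the cart (names are hypothetical)."""
    DRIVE = "drive"
    STEER = "steer"
    PAN = "pan"


class CartCommand(Enum):
    """Each command pairs one motor with one of its two directions."""
    RUN_FORWARDS = (Motor.DRIVE, +1)
    RUN_BACKWARDS = (Motor.DRIVE, -1)
    STEER_LEFT = (Motor.STEER, -1)
    STEER_RIGHT = (Motor.STEER, +1)
    PAN_LEFT = (Motor.PAN, -1)
    PAN_RIGHT = (Motor.PAN, +1)

    @property
    def motor(self):
        return self.value[0]

    @property
    def direction(self):
        return self.value[1]


# Three motors times two directions gives the six commands.
commands_per_motor = {
    m: [c for c in CartCommand if c.motor is m] for m in Motor
}
```

Three motors with two directions each account for exactly the six
commands listed.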

	(turn table task). The turn table task is to construct
a 3-D model from a sequence of 2-D television images taken
of an object rotated on a turn table.

	(block tasks). The classic block vision task, dating from
Roberts, consists of two parts: first convert a video image
into a line drawing; second, find a selection of
prototype blocks that account for the line drawing.

[single image vs. multiple images].
[perfect line drawing puzzles: Guzman & Waltz].
[imperfect line drawing analysis]

	(Recognition tasks).
	(Vision  systems).  The  structure  of  any  computer  vision
system can be  expressed as a transducer between perceived images and
a world model.   The two  poles of the  vision transducer are  called
"bottom" for images and "top" for  models. Although I do not like the
vision/language  analogy, I wish  to adopt the top  and bottom jargon
as formal vision terminology, because it is concise and widely used.

	The vision transducer may be bidirectional 
and visual transduction is a continuous rather than a discrete process.

	1. bidirectional rather than one way.
	2. continuous rather than discrete.
	3. exact rather than fuzzy.
	4. numerical rather than linguistic.

	Computer  vision  is  the  inverse  of
computer graphics.  The problem of computer graphics is to synthesize
images from three dimensional models; the  problem of computer vision
is  to analyze  images into  three dimensional  models.
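
	The asymmetry between the two directions can be illustrated
with a pinhole camera. This is a sketch under assumptions that are
not in the text: an ideal pinhole model, a camera at the origin
looking along +z, and the hypothetical function names project and
back_project. Synthesis is a function. Analysis of a single image
point is not, because every point on a viewing ray maps to the same
image point, so depth must come from somewhere else.

```python
def project(point, focal_length=1.0):
    """Synthesis (3-D to 2-D): perspective projection of one point.

    Assumes a pinhole camera at the origin looking along +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def back_project(image_point, depth, focal_length=1.0):
    """Analysis (2-D to 3-D): possible only once a depth is supplied."""
    u, v = image_point
    return (u * depth / focal_length, v * depth / focal_length, depth)


# Two different world points on the same viewing ray...
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

# ...synthesize to the same image point, so one image alone
# cannot tell them apart.
same_image_point = project(near) == project(far)
```

Given the depth, back_project(project(near), 4.0) recovers the near
point. Without it, the image point fixes only a ray.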

	The vision transducer has three possible modes:
verification, revelation and recognition.

Depending on circumstances,  the vision transducer  should be able to
run  almost  entirely  top-down  (verification  vision) or  bottom-up
(revelation vision).  Verification vision is all that is  required in
a well known and consequently predictable environment; whereas
revelation  vision is  required in  a brand  new or  rapidly changing
environment.
	(Bottom: the nature  of images). There are three  basic kinds
of information  in a 2-D visual image:  photometric,  geometric,  and
topological;  also there  are  four  kinds  of  2-D  images:  raster,
contour,   mosaic,  and feature.   The  traditional subject  of image
processing  involves  the  study  and  development  of  programs that
enhance,  transform  and compare  2D images.   Nearly all such  image
processing work can be subsumed into computer vision.

	(Top: the nature of worlds).  The  rules about the world that
can  be assumed  a priori by  a programmer  are the laws  of physics;
programming a simulation  of the  mundane physical world  to a  given
approximation is difficult.

	(recognition). Recognition involves comparing
perceived data with predicted data; such comparison can
be done on any of the four types of 2-D images or on the 3-D models.
Arcane recognition techniques can be avoided by improving the
prediction and the analysis so that matches are nearly obvious.

	(locus  solving).  The  crux   of  computer  vision
is to deduce information  about the world  being viewed from
images of  that world.   I believe  that the  world information  most
directly  relevant  is  the  physical location,    extent  and  light
scattering   properties  of  solid   opaque  objects;  the  location,
orientation and scales  of the cameras  that take the pictures;  and
the  location and  nature of  the lights  that illuminate  the world.
Accordingly,   three  central themes  of  my theory  are  body  locus
solving,   camera  solving, and  sun solving.  The macroscopic  world
doesn't  change very rapidly; between  any two world  states there is
an intermediate  world state.   Parallax  is the  principal means  of
depth  perception.   Parallax  is  the  alchemist that  converts  2-D
images  into 3-D models. Revelation vision  is a process of comparing
perceived images taken  in sequence and  constructing a 3-D model  of
the unanticipated objects.
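
	How parallax yields depth can be shown with the simplest
possible case. The sketch below assumes an ideal pinhole camera that
is translated sideways by a known baseline between two pictures,
with no rotation. Those assumptions, and the name
depth_from_parallax, are illustrative and not taken from the text.
A point at (X, Y, Z) appears at u1 = f*X/Z in the first image and at
u2 = f*(X - b)/Z in the second, so the shift u1 - u2 equals f*b/Z
and the depth follows directly.

```python
def depth_from_parallax(u_first, u_second, baseline, focal_length=1.0):
    """Depth of a point from its horizontal shift between two images.

    Assumes a pinhole camera moved sideways by `baseline` along +x,
    without rotation, between the first and second image.
    """
    disparity = u_first - u_second
    if disparity == 0:
        # No shift: the point is infinitely far away, or the camera
        # did not move, so depth cannot be recovered.
        raise ValueError("no parallax between the two images")
    return focal_length * baseline / disparity


# A point at X = 1, Z = 4 seen before and after moving the camera
# 0.5 units to the right (f = 1):
u1 = 1.0 / 4.0          # f * X / Z
u2 = (1.0 - 0.5) / 4.0  # f * (X - b) / Z
z = depth_from_parallax(u1, u2, baseline=0.5)
```

Nearby points shift more than distant ones, which is why the camera
positions must be solved before body locations can be.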
(Computer Vision and Artificial Intelligence).
	
	At one  extreme, computer vision  may be described  as merely
the problem of  getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities  in its
memory,  the rest  of  the problem  is  artificial intelligence.  The
other extreme is harder to depict because it requires figuring out where
to draw the line between vision software and intelligence software.

	Normal vision should not be an Artificial Intelligence
problem, in the sense that it will not involve searching a large
space of possibilities or solving abstract problems.

"The history of progress in the development  of systems for automatic
symbolic   integration  poses  an  interesting   question  about  the
definition of artificial intelligence. Few would argue  that Slagle's
SAINT  program was  a  product of  artificial intelligence  research.
Moses'  SIN program for symbolic integration  seldom needed to resort
to search,  and for  this reason some  people consider  it much  more
powerful (intelligent ?) than  SAINT. Now, Risch (1969) has developed
an  algorithm  for  integrating  many  types  of  expressions.  Risch
considers himself  a  mathematician, not  an artificial  intelligence
researcher.  In your opinion  should Risch's  algorithm be considered
part of the subject matter of artificial intelligence ? If  you would
exclude Risch from artificial intelligence,  how would you  respond to
the  statement  that  every  artificial  intelligence  program  might
eventually  be dominated  by  a  (more intelligent?)  non  artificial
intelligence algorithm?  If you would  include Risch, would  you also
include the long-division algorithm?"

			- Nils J. Nilsson, problem 4-5;
			Problem-Solving Methods in Artificial Intelligence.

	(Intellectual Entities). The larger context of a vision
theory depends on one's opinion about the nature of conscious
intelligent animals, men and robots. It is my opinion that mind is
to matter as computer software is to computer hardware. That is,
mind is a program that is running in the brain. Well now, what
software can account for consciousness, the inner private life of
the self that burns in our heads? The so-called stream of
consciousness consists of little voice(s) talking, fragments of
music playing, and, most important, the flow of the here and
now. The "here-and-now" is the totality of the particular sights,
sounds, smells, and so on that are being played in your head in
sync with the respective sensory stimuli. So I believe that
the major computation being performed by an intellectual entity in
order to stay conscious of its external world is a reality
simulation.
(Feigenbaum Quote).

	[the relation between Artificial Intelligence, experiment,
environmental simulation].

	"The design,  implementation, and use  of the  robot hardware
presents  some   difficult,  and  often  expensive,  engineering  and
maintenance problems. If  one is to  work in  this area solving  such
problems  is   a  necessary  prelude   but,  more  often   than  not,
unrewarding  because the activity  does not address  the questions of
A.I. research that motivate  the project. Why,  then, build  devices?
Why not simulate  them and their environment? In  fact, the SRI group
has done  good work  in simulating  a  version of  their robot  in  a
simplified environment. The  answer given is  as follows. It  is felt
by  the  SRI  group  that  the  most  unsatisfactory  part  of  their
simulation effort was  the simulation of  the environment. Yet,  they
say that  90% of  the effort  of the simulation  team went  into this
part  of  the  simulation. It  turned  out to  be  very  difficult to
reproduce in an internal representation for a  computer the necessary
richness of environment that  would give rise to interesting behavior
by the highly  adaptive robot.  It is easier  and cheaper  to build  a
hardware robot  to extract what  information it  needs from the  real
world  than to organize  and store a  useful model.  Crudely put, the
SRI group's argument  is that the most  economic and efficient  store
of information about the real world is the real world itself."

					- E. A. Feigenbaum [ref. X].
The Vision Transducer.

	Grand Theory:  The structure  of any  computer vision  system
can  be  expressed as  a  transducer between  a  bottom  of perceived
images and a top, world model.

	Petit Theory:
Computer vision  is the inverse of computer  graphics. The problem of
computer graphics  is  to  synthesize images  from  three  dimensional
models; the  problem of  computer vision  is to  analyze images  into
three dimensional models.

	
(Vision loop terminology)

	1. PREDICT	3D → 2D		synthesis	Verification
	2. PERCEIVE	2D → 3D		analysis	Revelation
	3. COMPARE			recognition
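
	The three steps can be tied together in a toy loop. This is
only a sketch of the control flow, under heavy simplifying
assumptions that are not from the text: the "world model" is a
dictionary from cell index to intensity, the "image" is a 1-D raster,
and the names predict, compare and vision_step are hypothetical. No
real 3-D geometry is involved. When prediction and perception agree
the step is pure verification. When they disagree the model is
revised from the image, which stands in for revelation.

```python
def predict(model, width):
    """Synthesis: render the world model into a predicted raster."""
    return [model.get(i, 0) for i in range(width)]


def compare(predicted, perceived):
    """Recognition: cells where prediction and perception disagree."""
    return [i for i, (p, q) in enumerate(zip(predicted, perceived))
            if p != q]


def vision_step(model, perceived):
    """One pass of predict, perceive (given), compare.

    Returns the mode that was needed and the possibly revised model.
    """
    predicted = predict(model, len(perceived))
    mismatches = compare(predicted, perceived)
    if not mismatches:
        return "verification", model
    revised = dict(model)
    for i in mismatches:
        if perceived[i] == 0:
            revised.pop(i, None)       # predicted object is gone
        else:
            revised[i] = perceived[i]  # unanticipated object appears
    return "revelation", revised


model = {2: 9}
mode_a, model_a = vision_step(model, [0, 0, 9, 0])
mode_b, model_b = vision_step(model, [0, 5, 9, 0])
```

In the first call the image matches the prediction and the model is
left alone. In the second an unanticipated value at cell 1 forces a
revision.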

(Description of nearly pure top down vision)

(Description of nearly pure bottom up vision)

Bottom: The Nature of Images.

Assumption:	Computer vision based on digitized television images.

Alternatives:	1. Active 3-D imaging device.
		2. Non-light devices: sound, radar, neutrinos, etc.

	Although a super intellectual entity would have eyes that
could see the whole electromagnetic spectrum from gamma radiation to
direct current, as well as "voices" that could broadcast on any and
all frequencies, the video restriction

	An image contains three basic kinds of data:
topological data, geometric data, and photometric data.

	The quality of the particular computer vision system
that one is condemned to use is very likely to influence one's
theories.

Visual Organ

	size of image
	photometric accuracy, bits per pixel
	resolution
	speed of image taking

Computing Organ

	central processor
	primary memory
	secondary memory

Locus Solving.
	1. Camera Locus Solving.
	2. Body Locus Solving.
		Silhouette Cone Intersection.
		Envelope bodies.
	3. Sun Locus Solving.
		(compute it, look at it, shine and shadows).
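
	The idea behind silhouette cone intersection can be sketched
on a small voxel grid. The outline above gives no algorithm, so
everything here is an assumption for illustration: the views are
orthographic and aligned with the grid axes (so each "cone" becomes
a straight prism), the cameras are already solved, and the names
drop_axis and intersect_silhouettes are hypothetical. A voxel is
kept only if it falls inside the silhouette in every view.

```python
from itertools import product


def drop_axis(voxel, axis):
    """Orthographic projection along one grid axis."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)


def intersect_silhouettes(n, silhouettes):
    """Voxels of an n*n*n grid lying inside every silhouette prism.

    `silhouettes` maps a viewing axis (0, 1 or 2) to the set of 2-D
    cells covered by the object's silhouette in that view.
    """
    return {
        voxel
        for voxel in product(range(n), repeat=3)
        if all(drop_axis(voxel, axis) in cells
               for axis, cells in silhouettes.items())
    }


# View along x sees one cell at (y, z) = (1, 1).
# View along z sees two cells at (x, y) = (0, 1) and (1, 1).
views = {0: {(1, 1)}, 2: {(0, 1), (1, 1)}}
body = intersect_silhouettes(3, views)
```

The result is an envelope: the true body lies somewhere inside it,
and each additional view can only carve more away.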

Recognition.
Top: The Nature of Worlds.

Assumption:	The world model should be a 3-D geometric model.

Alternatives:	1. Image memory and 2-D models.
		2. Procedural Knowledge.
		3. Semantic knowledge.
		4. Formal Logic models.
		5. Statistical world model.

	(On Partial Knowledge).

Assumption:	Partial knowledge should be represented by approximation.
Alternatives:	1. Tree of possibilities.
		2. Multi valued logic.
		3. Probabilities.

(Alternate world models).
(Reality Simulation).

"For the purpose of  presenting my argument I must  first explain the
basic  premise of  sorcery as don  Juan presented  it to me.  He said
that for a sorcerer, the world  of everyday life is not real, or  out
there, as we believe  it is. For a sorcerer, reality  or the world we
all  know, is  only a  description. For  the sake of  validating this
premise don Juan  concentrated the best  of his efforts into  leading
me to  a genuine conviction that what  I held in mind as  the world at
hand was merely a  description of the world;  a description that  had
been pounded into me from the moment I was born."

			- Carlos Castaneda. Journey to Ixtlan.
Related Vision Work.
		Stanford Hand/Eye
		SRI - Hart & Duda.
		MIT Guzman, Waltz